ABSTRACT

In this paper we show that there is a close relationship between variable
metric methods of function minimization and filtering of linear stochastic systems
with disturbances which are modelled as unknown but bounded functions. We develop
new variable metric algorithms for function minimization.

1.  INTRODUCTION 

The objective of this paper is to show that there is a close relationship
between variable metric methods of function minimization and filtering of linear
stochastic systems with disturbances which are modelled as unknown but bounded
functions.

It is well known that Newton's method for function minimization exhibits
quadratic convergence in the neighborhood of the minimum. This rapid convergence
rate, however, is obtained at the expense of requiring second derivative computations
and the solution of a linear equation at each iteration. On the other hand,
variable metric methods require neither second derivative computations nor matrix
inversion (solution of a linear equation), and versions of this algorithm are known
to exhibit reasonably rapid convergence. Intuitively, one may consider a variable
metric method as one where an estimate of the Hessian (or of the inverse of the
Hessian) is obtained on the basis of function values and gradient values from past
iterations, and the next step is determined on the basis of this estimate. In this
paper, we attempt to make this intuitive notion precise.
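The intuition of building a Hessian estimate from gradient differences can be illustrated with a classical rank-one secant update (Broyden's update). This is a minimal sketch for illustration only, not the filter-based update derived in this paper; the function names and the test quadratic are hypothetical.

```python
def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def rank_one_secant_update(G_hat, s, z):
    """Return G_hat + (z - G_hat s) s' / (s' s).

    The updated matrix maps the step s exactly onto the observed
    gradient difference z, i.e. it satisfies the secant equation.
    """
    residual = [zi - gi for zi, gi in zip(z, matvec(G_hat, s))]
    ss = sum(si * si for si in s)
    n = len(s)
    return [[G_hat[i][j] + residual[i] * s[j] / ss for j in range(n)]
            for i in range(n)]


# Hypothetical test problem: f(x) = 0.5 x'Gx, so the gradient is g(x) = G x.
G_true = [[2.0, 1.0], [1.0, 3.0]]
x0 = [1.0, -1.0]
s = [0.5, 0.25]                      # step taken from x0
x1 = [a + b for a, b in zip(x0, s)]

# Gradient difference observed along the step.
z = [a - b for a, b in zip(matvec(G_true, x1), matvec(G_true, x0))]

G_hat = [[1.0, 0.0], [0.0, 1.0]]     # crude initial Hessian estimate
G_new = rank_one_secant_update(G_hat, s, z)

# After the update the estimate reproduces the observed gradient change.
secant_residual = max(abs(a - b) for a, b in zip(matvec(G_new, s), z))
```

Only gradient values enter the update; no second derivatives are evaluated, which is the property that the paragraph above attributes to variable metric methods.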

The work closest in spirit to this work is the doctoral dissertation of THOMAS
[4]. The stochastic models we derive are, however, somewhat different and we exploit
linear filtering theory to the fullest extent possible. As was also done by Thomas,
we obtain algorithms which do not require accurate line searches.


2.  FILTERING  MODEL  FOR  THE  ALGORITHM 


Consider the problem of minimizing

$$ \{\, f(x) \mid x \in \mathbb{R}^n \,\}, \tag{2.1} $$

where $f$ is assumed to be thrice continuously differentiable on $\mathbb{R}^n$. Let

$$ \nabla f(x) = g(x) \quad \text{and} \quad D^2 f(x) = G(x). \tag{2.2} $$

Let $x^*$ be a local minimum of $f$ and, in some open, convex neighborhood $D$ of
$x^*$, let us assume

$$ \| G(x_k + \theta_1 s_k) - G(x_k + \theta_2 s_k) \| \le L\,|\theta_1 - \theta_2|\,\|s_k\|, \quad \text{where } L > 0, \tag{2.3} $$

for all $x_k,\ x_k + s_k \in D$, $\theta_1, \theta_2 \in [0,1]$.

We wish to discuss iterative algorithms for minimizing $f(x)$, where the algorithm
proceeds as $x_{k+1} = x_k + s_k$, $k = 0, 1, 2, \ldots$

Let  us  use  the  notation 


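$$ G_k(\theta) = G(x_k + \theta s_k), \qquad g_k(\theta) = g(x_k + \theta s_k), \qquad k = 0, 1, 2, \ldots \tag{2.4} $$

It is easy to see that there exists $U_k \in L^\infty(0,1;\ \mathcal{L}(\mathbb{R}^n))$ such that

$$ G_k(\theta) = G_k(0) + \int_0^\theta U_k(t)\,dt, \quad \text{with } \|U_k(\theta)\| \le L\,\|s_k\|, \quad k = 0, 1, 2, \ldots,\ \theta \in [0,1], \tag{2.5} $$

$$ g_k(\theta) = g_k(0) + \int_0^\theta G_k(t)\,s_k\,dt. \tag{2.6} $$

Evaluating (2.5) and (2.6) at $\theta = 1$, and using the natural notation
$G_k(1) = G_{k+1}$, $G_k(0) = G_k$, $g_k(1) = g_{k+1}$, etc., we get

$$ G_{k+1} = G_k + \int_0^1 U_k(t)\,dt, \tag{2.7} $$

$$ g_{k+1} = g_k + G_k s_k + \int_0^1 [G_k(t) - G_k(0)]\,s_k\,dt. \tag{2.8} $$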




Let

$$ v_k = \int_0^1 U_k(t)\,dt, \qquad w_k = \int_0^1 [G_k(t) - G_k(0)]\,s_k\,dt. $$

Then, we may rewrite (2.7) and (2.8) as

$$ G_{k+1} = G_k + v_k, \qquad g_{k+1} = g_k + G_k s_k + w_k. $$

It is natural to think of $v_k$ and $w_k$ as process and observation noise
respectively. They are obviously correlated. We now attempt to bound the noise.
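As a rough first indication of the size of the two noise terms, crude norm bounds follow directly from (2.3) and (2.5). The following is a sketch only: it assumes a submultiplicative matrix norm consistent with the vector norm, and it is not the sharper set description developed in Section 3. The second bound uses (2.3) with $\theta_1 = t$, $\theta_2 = 0$.

```latex
\|v_k\| \le \int_0^1 \|U_k(t)\|\,dt \le L\,\|s_k\|,
\qquad
\|w_k\| \le \int_0^1 \|G_k(t) - G_k(0)\|\,\|s_k\|\,dt
        \le \int_0^1 L\,t\,\|s_k\|^2\,dt
        = \tfrac{L}{2}\,\|s_k\|^2 .
```

Under these assumptions the process noise is of first order in the step length, while the observation noise is of second order.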

3.  BOUNDS  ON  THE  NOISE 


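To do the bounding, we use the following device: Let $\gamma_i$ denote the $i$-th row
of $G$. We then use the isomorphism

$$ i : \mathcal{L}(\mathbb{R}^n) \to \mathbb{R}^{n^2} : G \mapsto (\gamma_1\ \gamma_2\ \cdots\ \gamma_n)'. $$

We can then rewrite equations (2.5) and (2.6) in differential form:

$$ \frac{d}{d\theta}\,\bigl(iG_k(\theta)\bigr) = iU_k(\theta), \qquad \frac{d}{d\theta}\,g_k(\theta) = (I_n \otimes s_k')\,\bigl(iG_k(\theta)\bigr). \tag{2.10} $$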

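In the above, $'$ denotes transpose and $\otimes$ denotes tensor product. Writing (2.10)
in vector-matrix form:

$$ \frac{d}{d\theta} \begin{pmatrix} iG_k(\theta) \\ g_k(\theta) \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ I_n \otimes s_k' & 0 \end{pmatrix} \begin{pmatrix} iG_k(\theta) \\ g_k(\theta) \end{pmatrix} + \begin{pmatrix} iU_k(\theta) \\ 0 \end{pmatrix}. \tag{2.11} $$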


We are interested in bounding $v_k$ and $w_k$ as $U_k(\cdot)$ varies over the class of
all mappings given by (2.5). Clearly, the set of all $(iv_k, w_k)$ as $U_k(\cdot)$ varies
is a convex set in $\mathbb{R}^{n^2} \times \mathbb{R}^n$. We can compute the support
function of this set and estimate that the support function $\eta_k(G^*, g^*)$,
$G^* \in (\mathcal{L}(\mathbb{R}^n))^*$, $g^* \in (\mathbb{R}^n)^*$ ($*$ denotes the dual space) satisfies:

$$ \eta_k(G^*, g^*) \le L\,\|s_k\| \left[ \tfrac{1}{3}\,\|s_k\|^2\,\|g^*\|^2 + (g^*, G^* s_k) + \|G^*\|^2 \right]^{1/2}. \tag{2.12} $$


It is easy to see that an appropriate choice of $U_k(\cdot)$ in the class defined by
(2.5) attains this bound and hence the support function can be computed as:

$$ \eta_k(G^*, g^*) = L\,\|s_k\|\,\bigl\langle (G^*, g^*),\ Q_k\,(G^*, g^*) \bigr\rangle^{1/2}, \tag{2.13} $$

where $\langle \cdot, \cdot \rangle$ is the obvious inner product in $\mathcal{L}(\mathbb{R}^n) \times \mathbb{R}^n$ and $Q_k$ is the matrix
defined from the right hand side of (2.12). We can check that $Q_k > 0$ (unless
$s_k = 0$).
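An estimate of the form (2.12) can be obtained by a short computation. The sketch below assumes that the inner product on $\mathcal{L}(\mathbb{R}^n)$ is the trace (Frobenius) inner product and that the bound on $U_k$ in (2.5) is taken in the corresponding norm; both are assumptions, since the text does not state which norm is meant.

```latex
% Exchange the order of integration in the definition of w_k:
w_k = \int_0^1 \!\! \int_0^t U_k(\tau)\,s_k\,d\tau\,dt
    = \int_0^1 (1-\tau)\,U_k(\tau)\,s_k\,d\tau .

% Pair (v_k, w_k) with (G^*, g^*), using g^{*\prime} U s_k = \langle g^* s_k',\, U \rangle :
\langle G^*, v_k \rangle + \langle g^*, w_k \rangle
  = \int_0^1 \bigl\langle G^* + (1-\tau)\,g^* s_k',\; U_k(\tau) \bigr\rangle\,d\tau
  \le L\,\|s_k\| \int_0^1 \bigl\| G^* + (1-\tau)\,g^* s_k' \bigr\|\,d\tau .

% Cauchy-Schwarz in \tau, with \int_0^1 (1-\tau)\,d\tau = 1/2
% and \int_0^1 (1-\tau)^2\,d\tau = 1/3 :
\int_0^1 \bigl\| G^* + (1-\tau)\,g^* s_k' \bigr\|\,d\tau
  \le \Bigl[ \|G^*\|^2 + (g^*, G^* s_k) + \tfrac{1}{3}\,\|s_k\|^2\,\|g^*\|^2 \Bigr]^{1/2} .
```

The first identity also makes the correlation between the two noise terms explicit: both $v_k$ and $w_k$ are linear functionals of the same unknown function $U_k(\cdot)$.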

The above discussions may be combined in the following:

Consider the problem of estimating $G_k$ from

$$ G_{k+1} = G_k + v_k, \tag{2.14} $$

$$ z_k = G_k s_k + w_k, \quad \text{where } z_k = g_{k+1} - g_k. \tag{2.15} $$

Let $G_0 \in \Omega_0$, where

$$ \Omega_0 = \{\, G \in \mathcal{L}(\mathbb{R}^n) \mid \langle G - \hat{G}_0,\ \Pi_0^{-1}(G - \hat{G}_0) \rangle \le 1 \,\} \tag{2.16} $$

and $\hat{G}_0$, $\Pi_0 > 0$ are given.
o o 


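Then

Proposition 1

$$ (v_k, w_k) \in \bigl\{\, (v, w) \;\big|\; \bigl\langle (v, w),\ (L^2 \|s_k\|^2\, Q_k)^{-1} (v, w) \bigr\rangle_{\mathcal{L}(\mathbb{R}^n) \times \mathbb{R}^n} \le 1 \,\bigr\}. \tag{2.17} $$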


4.  SOLUTION  OF  THE  ESTIMATION  PROBLEM


The estimation problem can now be solved using the work of BERTSEKAS [1]. It
consists of recursively estimating the sets $P_k$, which are ellipsoids. The centre
of the ellipsoid is the desired estimate. These results are summarised in


Proposition 2


ft 


K+l  - n2  i 1'Yk} 


1R  + R 


where  HK+1  satisfies 


k-°'1'2' 


and  P is  given  by 


W(1+HskH)  VL  Hsk 


( 2n  M L2Hskll  12  I I SJc  I I \ 

Jp.+L  s.  I - [P.+  — I Js,  s'.  [P. -I r — - — I ] ( 

\ k 11  k' 1 n k 2 n k k x 2 n > 

,V.  » .♦  4^n>V  ’ 


L2||s. 


k'  ‘ k 2 


<W  ‘ k * iVVk)sVV  u 


V‘Vl  ll=kH  I,sk> 


and  llvVkll  2 

V‘  L2||stl|  U+||skl|)s,k[P)t*  I„ls|[ 


Proposition  3 


If $H_k = (\hat{G}_k)^{-1}$ exists and $\hat{G}_{k+1}$ is generated by (2.20)-(2.22), then


Vi  ■ "k  + 1V!W1»lY.  _ 

VY'VVk1 


is the inverse of $\hat{G}_{k+1}$, where


(V  lV  llskll  I 1 V 

ak = — — n 


<•*'  'V  ^ w 


< 1 


■v  11ML  i„i  ‘k 


5.  NEW  ALGORITHMS  FOR  FUNCTION  MINIMIZATION

Since we are looking for an estimate of the Hessian (or of the inverse of the
Hessian), it is desirable that our estimates be symmetric. This suggests the
following algorithm:

(i)  Propagate $H_k$ and $P_k$ according to (2.23) and (2.20).

(ii)  Symmetrize $H_k$ to obtain $H^*$.

(iii)  Find the closest approximation $H$ to $H^*$ so that the secant
equation $s_k = H z_k$ is satisfied.

(iv)  The new step is computed according to Powell's dog-leg
strategy (cf. POWELL [3]).

We now present a number of convergence results corresponding to the use
of different estimates for the Hessian.

Suppose we update $P_k$ and $H_k$ according to (2.20) and (2.23)-(2.25) with
$P_0 = \sigma^2 I$. The new step is chosen according to the formula

$$ s_k = -H_k g_k, \tag{5.1} $$

and let us update $\Pi_k$ according to


<Hlskll>[Vl|skl|ln 


l2V1J  dkd> 


("o  = ° 


We  then  have: 


1*1 

' d.  I 


(s'  dJ 
k k 


(5.2) 


Lemma 5.1

$$ \Pi_k \ge P_k > 0 \quad \forall k \ge 0. \tag{5.3} $$


We know that if $\hat{G}_k$ is non-singular then

$$ \hat{G}_{k+1} = \hat{G}_k + (z_k - \hat{G}_k s_k)\,d_k', \quad k = 0, 1, 2, \ldots \tag{5.4} $$


We  can  then  show 


Lemma  5.2 


There exists a $\gamma > 0$ such that


I I Gv-Gv  1 1 2 - kl°* 


(5.5) 


These  ideas  enable  us  to  prove  the  following  basic  convergence  theorem: 




Theorem  5.3 

Let $g : \mathbb{R}^n \to \mathbb{R}^n$ be differentiable in an open convex neighborhood $D$ of $x^*$,
where $x^*$ satisfies $g(x^*) = 0$ and $Dg(x^*) = G(x^*)$ is non-singular. Let
us suppose that $G(\cdot)$ satisfies

$$ \|G(x) - G(y)\| \le L\,\|x - y\|, \quad \forall x, y \in D. \tag{5.6} $$

Then for each $\gamma > 0$, $r \in [0,1]$, $\exists\, \delta = \delta(\gamma, r)$, $\varepsilon = \varepsilon(\gamma, r)$ such that if
$\|x_0 - x^*\| < \delta$ and $\|G_0 - \hat{G}_0\| \le \gamma\sigma$, $\sigma \in [0, \varepsilon]$, then the sequence

$$ x_{k+1} = x_k - (\hat{G}_k)^{-1} g_k \tag{5.7} $$

converges to $x^*$. Moreover

$$ \|x_{k+1} - x^*\| \le r\,\|x_k - x^*\| \tag{5.8} $$

and the sequences $(\|\hat{G}_k\|)_{k=0,1,\ldots}$ and $(\|H_k\|)_{k=0,1,\ldots}$ are uniformly bounded.

Theorem 5.3 shows that we obtain linear convergence. One can show that the
convergence is actually superlinear.
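The iteration (5.7), combined with a rank-one update of the form (5.4), can be sketched in a few lines. In the sketch below the vector $d_k$ is taken to be Broyden's choice $d_k = s_k / (s_k' s_k)$, which is an assumption made purely for illustration; the paper's own $d_k$ depends on the filter quantities and is not reproduced here. The test function, `grad`, `solve2` and `quasi_newton` are likewise hypothetical names.

```python
def grad(x):
    # Gradient of the hypothetical test function
    # f(x) = x1^2 + x1*x2 + x2^2 + x1^4 / 4, whose minimizer is x* = (0, 0).
    return [2.0 * x[0] + x[1] + x[0] ** 3, x[0] + 2.0 * x[1]]


def solve2(A, b):
    # Solve the 2x2 linear system A y = b by Cramer's rule.
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(A[1][1] * b[0] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]


def quasi_newton(x, G_hat, tol=1e-12, max_iter=50):
    """Iterate x_{k+1} = x_k - G_hat_k^{-1} g_k with a rank-one update of G_hat."""
    g = grad(x)
    for _ in range(max_iter):
        if max(abs(v) for v in g) < tol:
            break
        s = solve2(G_hat, [-g[0], -g[1]])          # step s_k = -G_hat_k^{-1} g_k
        x = [x[0] + s[0], x[1] + s[1]]
        g_new = grad(x)
        z = [g_new[0] - g[0], g_new[1] - g[1]]     # z_k = g_{k+1} - g_k
        ss = s[0] ** 2 + s[1] ** 2
        if ss == 0.0:
            break
        d = [s[0] / ss, s[1] / ss]                 # ASSUMED d_k = s_k / (s_k' s_k)
        Gs = [G_hat[0][0] * s[0] + G_hat[0][1] * s[1],
              G_hat[1][0] * s[0] + G_hat[1][1] * s[1]]
        # G_hat_{k+1} = G_hat_k + (z_k - G_hat_k s_k) d_k'
        G_hat = [[G_hat[i][j] + (z[i] - Gs[i]) * d[j] for j in range(2)]
                 for i in range(2)]
        g = g_new
    return x, G_hat


# Start near x* with an initial estimate equal to the Hessian at x*.
x_final, G_final = quasi_newton([0.3, -0.2], [[2.0, 1.0], [1.0, 2.0]])
```

The starting point and initial estimate are chosen close to the solution, in the spirit of the local hypotheses of Theorem 5.3; the sketch makes no claim about behaviour far from $x^*$.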

So far we have constructed an algorithm which uses the output of the filter
directly. As we have previously remarked, it would be desirable to "symmetrize"
the estimate and use this in the algorithm. It can be shown that an algorithm
using the symmetrized estimate converges linearly under the same hypotheses as
those of Theorem 5.3. However, a proof of convergence of the algorithm when
the estimates are also chosen to satisfy the secant equation is at present
not available. The details of the proofs of the various results presented in
this paper will appear elsewhere [cf. MITTER-TOLDALAGI [2]].